I wanted to improve upon Figure 4 of the Koren et al. 2012 study. Although it is possible to infer from that figure that the microbiota of children becomes more similar to that of their mothers over time, this is not easy to see in the figure as presented. The original analysis also required matched child and mother sample pairs. I specifically wanted to ask whether the phenomenon described in the Koren et al. 2012 study generalizes across groups (i.e., without the need for matched pairs). Secondly, I wanted to know whether I could identify which features were important to the model when it classified a microbiota sample as coming from an adult mother or not.
To do this, I chose the Random Forest (RF) machine learning algorithm. The benefits of the model are that it takes into account the interdependency of the OTUs, it does not require the data to be normally distributed, and it can handle zero-inflated data without much difficulty. For this question, these properties make it better suited than general linear models and feature selection algorithms such as LEfSe.
The prediction function that I used in R (v3.4.1) returns the probability that a sample comes from an adult mother. These probabilities are unlikely to be normally distributed, so to be conservative when testing for significance I used the Wilcoxon rank-sum test. To correct for multiple comparisons I used the Benjamini-Hochberg method rather than the Bonferroni correction, since the latter can be overly stringent and result in more false negatives. All P-values reported herein have been corrected for multiple comparisons.
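To make the difference between the two corrections concrete, here is a minimal sketch in Python of how Benjamini-Hochberg and Bonferroni adjust the same set of P-values. This is not the code used for the analysis, which was run in R (where `p.adjust` with `method = "BH"` performs the same adjustment). The function names and the raw P-values below are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues):
    """Return Bonferroni adjusted P-values (each multiplied by m, capped at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Made-up raw P-values, for illustration only (not results from this study).
raw_p = [0.001, 0.012, 0.02, 0.03, 0.2]

bh = benjamini_hochberg(raw_p)
bonf = bonferroni(raw_p)

alpha = 0.05
n_sig_bh = sum(p < alpha for p in bh)
n_sig_bonf = sum(p < alpha for p in bonf)

print("BH adjusted:        ", [round(p, 4) for p in bh])
print("Bonferroni adjusted:", [round(p, 4) for p in bonf])
print("Significant at 0.05 (BH, Bonferroni):", n_sig_bh, n_sig_bonf)
```

With these made-up values, four of the five comparisons remain below 0.05 after the Benjamini-Hochberg adjustment, while only one does after Bonferroni, which is the stringency difference described above.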
Figure 1: Transition of Bacterial Community from Child to Adult